@ Department of Political Sciences
University of Naples Federico II
Naples, ITALY
domenicovistocco.it
Google Scholar
ResearchGate
@domenicovistocco
Proposal: regression on latent components as a possible solution to collinearity in QR
Case study: assessment of the quality of service in presence of highly correlated predictors
“Simulation” study: analysis of different degrees of collinearity and various response distributions
When \(\mathbf{X}^\prime \mathbf{X}\) is nonsingular:
The covariance matrix of the LS estimator is: \[ cov(\boldsymbol{\hat{\beta}})= \sigma^{2}(\mathbf{X}^\prime \mathbf{X})^{-1} \] and can also be formulated in terms of the spectral decomposition of the \(\mathbf{X}^\prime \mathbf{X}\) matrix: \[ cov(\boldsymbol{\hat{\beta}})= \sigma^{2}\sum_{a=1}^{A}\mathbf{p}_{a}(1/\lambda_{a})\mathbf{p}_{a}^\prime \] where the \(\mathbf{p}_{a}\)s and \(\lambda_{a}\)s are the eigenvectors and the eigenvalues of \(\mathbf{X}^\prime \mathbf{X}\), respectively. Small eigenvalues, i.e. near-collinear predictors, therefore inflate the variance of the estimates.
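The equivalence of the two formulations can be checked numerically; the following sketch (simulated data, with \(\sigma^2=1\) for illustration) computes the covariance matrix both ways:

```python
import numpy as np

# Sketch: the LS covariance matrix computed directly and via the
# spectral decomposition of X'X give the same result (sigma^2 = 1 here;
# the data are simulated for illustration, not from the case study).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
XtX = X.T @ X

# Direct formula: sigma^2 (X'X)^{-1}
cov_direct = np.linalg.inv(XtX)

# Spectral formula: sigma^2 * sum_a p_a (1/lambda_a) p_a'
lam, P = np.linalg.eigh(XtX)
cov_spectral = sum((1.0 / lam[a]) * np.outer(P[:, a], P[:, a])
                   for a in range(len(lam)))

print(np.allclose(cov_direct, cov_spectral))
```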
Variance Inflation Factor
\(VIF_{j}=\frac{1}{1-R^{2}_{j}}\), where \(R^{2}_{j}\) is the coefficient of determination of the regression of the \(j\)-th predictor on the remaining ones
CN - Condition Number
\(CN=\left(\frac{\hat{\lambda}_1}{\hat{\lambda}_{J}}\right)^{1/2}\), where \(\hat{\lambda}_1\) and \(\hat{\lambda}_{J}\) are the largest and the smallest eigenvalues of \(\mathbf{X}^\prime \mathbf{X}\)
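A minimal sketch of both diagnostics, on simulated data with two highly correlated columns (variable names and the degree of correlation are illustrative assumptions):

```python
import numpy as np

# Simulated predictor matrix: two nearly collinear columns plus one
# independent column, standardized before computing the diagnostics.
rng = np.random.default_rng(1)
z = rng.normal(size=(200, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(200, 2)),   # two highly correlated columns
               rng.normal(size=(200, 1))])            # one independent column
Xc = (X - X.mean(0)) / X.std(0)

def vif(X, j):
    """VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing x_j on the others."""
    y = X[:, j]
    Z = np.delete(X, j, axis=1)
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r2 = 1 - np.sum((y - Z @ beta) ** 2) / np.sum((y - y.mean()) ** 2)
    return 1 / (1 - r2)

# CN = (lambda_max / lambda_min)^(1/2) for X'X
lam = np.linalg.eigvalsh(Xc.T @ Xc)
cn = np.sqrt(lam.max() / lam.min())
print([round(vif(Xc, j), 1) for j in range(3)], round(cn, 1))
```

The two collinear columns get large VIFs, the independent one a VIF close to 1, and the condition number grows with the collinearity.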
In a nutshell:
Note
Since the asymptotic distribution of the QR estimator depends on the inverse of the variance-covariance matrix, the variance of the QR estimator increases with the degree of correlation among the predictors.
subset selection
stepwise selection
Note
due to the different degrees of freedom, an estimate of the test error is required to choose among the candidate models
training error measures adjusted for model complexity (Mallows' \(C_p\), \(AIC\), \(BIC\), adjusted \(R^2\))
all \(p\) predictors are kept in the model, but the coefficient estimates are shrunken towards 0
this also reduces the variance of the estimates
possibly, some of the coefficients may be constrained to exactly zero, implicitly excluding the corresponding predictors from the model
the type of constraint defines the shrinkage method
Note
Ridge regression tends to assign similar coefficients to correlated predictors, whereas lasso regression may assign quite different coefficients to correlated predictors, typically keeping one and shrinking the others to zero
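A small illustration of this note, assuming scikit-learn is available; the data, penalty values, and coefficients are illustrative, not from the study:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Two almost identical predictors: ridge spreads the effect across both,
# lasso concentrates it on one of them (simulated data).
rng = np.random.default_rng(2)
x1 = rng.normal(size=300)
x2 = x1 + 0.01 * rng.normal(size=300)      # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.5 * rng.normal(size=300)

ridge = Ridge(alpha=1.0).fit(X, y).coef_   # similar coefficients
lasso = Lasso(alpha=0.1).fit(X, y).coef_   # one coefficient pushed to zero
print("ridge:", ridge.round(2), "lasso:", lasso.round(2))
```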
Methods that construct \(m\) new predictor variables (components) as linear combinations of the original predictor variables
Setting \(m < p\) reduces the model complexity
Principal components analysis (PCA) is applied to the matrix of predictors \(\mathbf{X}\) in order to extract the \(m\) leading principal components \[ \mathbf{X} = \mathbf{TP}^\prime + \mathbf{E} \] where \(\mathbf{T}\) is called the scores matrix and collects the \(m\) dimensions responsible for the systematic variation in \(\mathbf{X}\). Then, \[ \mathbf{y} = \mathbf{Tq} + \mathbf{f} \] where \(\mathbf{P}\) and \(\mathbf{q}\) are called loadings and describe how the components in \(\mathbf{T}\) relate to the original variables in \(\mathbf{X}\) and to \(\mathbf{y}\), respectively
The estimated scores \(\mathbf{\hat{T}}\) are used in the regression equation in place of the original predictors; LSR is used to estimate the regression coefficients in \(\mathbf{q}\), and \(\mathbf{f}\) corresponds to the error term.
The PCR solution, i.e. the loadings \(\mathbf{\hat{P}}\) and the regression coefficients \(\mathbf{\hat{q}}\), can be combined to give the regression equation: \[ \hat{y}=\bar{y} + \mathbf{{X}\hat{P}\hat{q}}, \] which can be interpreted in the same way as a classical LSR and where the intercept is equal to the mean \(\bar{y}\) since the \(\mathbf{X}\) matrix is centered.
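The PCR steps above can be sketched with plain numpy (simulated data; the choice of \(m\) and the coefficients are arbitrary, for illustration only):

```python
import numpy as np

# Sketch of PCR: extract the m leading components of the centered X,
# regress y on the scores by least squares, then map back to the
# original predictors: y_hat = ybar + X P_hat q_hat.
rng = np.random.default_rng(3)
n, p, m = 100, 5, 2
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)   # induce collinearity
y = X @ np.array([1.0, 1.0, 0.5, 0.0, 0.0]) + 0.3 * rng.normal(size=n)

Xc = X - X.mean(0)                               # center X as in the slides
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
P_hat = Vt[:m].T                                 # loadings  (p x m)
T_hat = Xc @ P_hat                               # scores    (n x m)
q_hat, *_ = np.linalg.lstsq(T_hat, y - y.mean(), rcond=None)

beta = P_hat @ q_hat                             # coefficients on original scale
y_hat = y.mean() + Xc @ beta                     # fitted values
```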
The extension of the principal component regression to the context of the QR is straightforward:
the extraction of the main components from the predictor matrix occurs in the same way
the regression of the response variable on the extracted components uses the QR instead of the LSR
The estimated scores matrix \(\mathbf{\hat{T}}\) of QPCR is obtained by minimizing the loss function: \[ ||\mathbf{X} - \mathbf{TP^\prime}||^{2}, \] whose solution is obtained through the SVD of \(\mathbf{X}\)
The estimated scores \(\mathbf{\hat{T}}\) are then used in the regression equation in place of the original predictors \[ Q_{\tau}(\mathbf{y} \vert \mathbf{T})=\mathbf{T}\hat{\boldsymbol{\beta}}(\tau) \]
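A sketch of QPCR under the assumption that statsmodels' `QuantReg` is used for the quantile fits (the simulated data and the quantile levels are illustrative):

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

# QPCR sketch: the estimated scores T_hat replace the original predictors,
# and a separate quantile regression is fitted for each tau.
rng = np.random.default_rng(4)
n, p, m = 200, 4, 2
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)     # collinear predictors
y = X[:, 0] + X[:, 1] + rng.normal(size=n)

Xc = X - X.mean(0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
T = Xc @ Vt[:m].T                                 # estimated scores T_hat
Tc = np.column_stack([np.ones(n), T])             # add an intercept

for tau in (0.25, 0.5, 0.75):
    beta_tau = QuantReg(y, Tc).fit(q=tau).params  # one model per tau
    print(tau, np.asarray(beta_tau).round(2))
```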
estimation of separate models for different quantile levels \(\tau \in (0, 1)\)
a dense set of quantiles completely characterizes the conditional distribution of the response
the use of PCR allows one to obtain a single set of “artificial predictors” that can be used for all the different conditional quantiles
this eases the interpretation, which can be complex for the other methods used to deal with multicollinearity
QPCR produces the same numerical and graphical outputs as PCR, the only difference being that the results are specific to each selected conditional quantile
data set regarding customers of a retailer who offers products both online and in-store
the data concern a random sample of 632 customers from the company's customer relationship management system
the collected variables are:
assess whether and to what extent the purchasing behaviour, the level of satisfaction with the seller and the personal characteristics of customers influence purchases made online
assess whether this impact changes according to the amount of money spent
different degrees of correlation among predictors and different types of response to compare LSR and QR
a sample size of 100 observations and 3 predictors (only 1 relevant for prediction)
the population model explains 70% of the variation in the response
the coefficient \(\gamma\) regulates the level of collinearity:
for each value of the \(\gamma\) grid, the standard errors of LSR and QR models were computed using the bootstrap procedure in order to have a fair comparison
1000 simulations for each value in the design grid
different types of responses
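The bootstrap step of the design can be sketched as follows (simulated data; only the LS slope is shown, but the same resampling applies to the QR fits):

```python
import numpy as np

# Sketch of the bootstrap used to put LSR and QR standard errors on the
# same footing: resample (y, x) pairs, refit, and take the standard
# deviation of the replicated coefficients.
rng = np.random.default_rng(5)
n, B = 100, 1000
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)

slopes = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)      # resample with replacement
    slopes[b] = np.polyfit(x[idx], y[idx], 1)[0]   # refit, keep the slope

se_boot = slopes.std(ddof=1)              # bootstrap standard error
```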
Percentage of variance explained by each principal component:

|   | PC1 | PC2 | PC3 |
|---|---|---|---|
| \(\gamma = 0\) | 36.33 | 33.87 | 29.8 |
| \(\gamma = 0.5\) | 45.44 | 34.16 | 20.4 |
| \(\gamma = 1.0\) | 55.19 | 36.37 | 8.44 |
| \(\gamma = 1.5\) | 65.17 | 30.71 | 4.12 |
| \(\gamma = 2.0\) | 66.17 | 31.93 | 1.9 |
| \(\gamma = 2.5\) | 72.16 | 27.03 | 0.81 |
| \(\gamma = 3.0\) | 85.03 | 14.74 | 0.23 |
| \(\gamma = 3.5\) | 91.79 | 8.12 | 0.09 |
| \(\gamma = 4.0\) | 95.43 | 4.54 | 0.03 |
| \(\gamma = 4.5\) | 97.47 | 2.52 | 0.01 |
| \(\gamma = 5.0\) | 97.87 | 2.13 | 0.00 |
Cumulative percentage of explained variance:

|   | PC1 | PC2 | PC3 |
|---|---|---|---|
| \(\gamma = 0\) | 36.33 | 70.20 | 100.00 |
| \(\gamma = 0.5\) | 45.44 | 79.60 | 100.00 |
| \(\gamma = 1.0\) | 55.19 | 91.56 | 100.00 |
| \(\gamma = 1.5\) | 65.17 | 95.88 | 100.00 |
| \(\gamma = 2.0\) | 66.17 | 98.10 | 100.00 |
| \(\gamma = 2.5\) | 72.16 | 99.19 | 100.00 |
| \(\gamma = 3.0\) | 85.03 | 99.77 | 100.00 |
| \(\gamma = 3.5\) | 91.79 | 99.91 | 100.00 |
| \(\gamma = 4.0\) | 95.43 | 99.97 | 100.00 |
| \(\gamma = 4.5\) | 97.47 | 99.99 | 100.00 |
| \(\gamma = 5.0\) | 97.87 | 100.00 | 100.00 |
preliminary results
main limitation: the focus is on the effect of collinearity on standard errors
effect of the collinearity in terms of bias
QPCR evaluation in terms of bias at different locations
and then effect on prediction ability